Wang Haihua
Classification is the task of assigning the samples in a dataset to classes. In machine learning (ML) we train a model, measure its performance, and then improve it by optimizing some loss function. But how do we measure that performance? Are there particular properties we should look at? An obvious approach is to compare the actual labels with the predicted ones, but simply dividing the number of correct predictions by the total number of predictions, i.e. the accuracy rate, does not tell the whole story.
Suppose more than 90% of the samples in a dataset belong to class 0 and the rest to class 1. Even without building any model, blindly predicting 0 for every input already achieves over 90% accuracy. Clearly, accuracy alone cannot reveal how well a model really performs, especially on an imbalanced dataset.
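A quick sketch of this effect (the labels below are made up: 95 zeros and 5 ones) shows how an always-zero "model" scores high on accuracy while learning nothing:
import numpy as np
from sklearn.metrics import accuracy_score
# Imbalanced toy labels: 95 samples of class 0, only 5 samples of class 1.
y_true = np.array([0] * 95 + [1] * 5)
# A "model" that blindly predicts 0 for everything.
y_pred = np.zeros_like(y_true)
print(accuracy_score(y_true, y_pred))  # 0.95, yet class 1 is never detected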
A better way to evaluate a classifier is to look at the confusion matrix. The general idea is to count how many times instances of class A are classified as class B.
Each row of the confusion matrix represents an actual class, and each column represents a predicted class. The confusion matrix contains a lot of information, but sometimes we want more compact metrics for comparison.
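As a minimal sketch (the labels below are made up for illustration), sklearn's confusion_matrix lays the counts out exactly this way, with actual classes as rows and predicted classes as columns:
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]
# Rows = actual class (0, then 1), columns = predicted class (0, then 1):
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3 1]
                                         #  [1 3]]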
Precision $$precision = \frac{TP}{TP+FP}$$ where TP is the number of true positives and FP is the number of false positives. A trivial way to get perfect precision is to make a single positive prediction and make sure it is correct (precision = 1/1 = 100%). But this is not very useful, because such a classifier simply ignores all but one positive instance.
Recall $$recall = \frac{TP}{TP+FN}$$ where FN is the number of false negatives, i.e. the positive instances the classifier misses.
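A minimal sketch (reusing the made-up labels from the confusion-matrix example) that computes both metrics from the confusion-matrix counts and checks them against sklearn:
from sklearn.metrics import confusion_matrix, precision_score, recall_score
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fp), precision_score(y_true, y_pred))  # precision: 0.75 both ways
print(tp / (tp + fn), recall_score(y_true, y_pred))     # recall: 0.75 both ways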
Suppose a classifier has a precision of 72.9% and a recall of 75.6%. How can the two criteria be combined into a single score?
We can use the F1 score, which is the harmonic mean of precision and recall:
$$F_1\text{-score} = \frac{2}{\frac{1}{precision}+\frac{1}{recall}}$$ The F1 score favors classifiers whose precision and recall are similar. However, in some contexts you care mostly about precision, and in others you care mostly about recall. For example, if you train a classifier to detect videos that are safe for kids, you would probably prefer one that rejects many good videos (low recall) but keeps only safe ones (high precision) over one with higher recall that lets a few terrible videos slip into your product (in that case you might even add a human review pipeline to check the classifier's selections). Conversely, suppose you train a classifier to detect shoplifters in surveillance images: a precision of only 30% may be acceptable as long as recall is 99% (security staff will get some false alerts, but almost every shoplifter will be caught).
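Plugging in the numbers from the example above (precision 72.9%, recall 75.6%), a short sketch of the computation:
precision, recall = 0.729, 0.756
# Harmonic mean of precision and recall.
f1 = 2 / (1 / precision + 1 / recall)
print(round(f1, 3))  # about 0.742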
Can we raise precision and recall at the same time? Unfortunately not: increasing precision tends to reduce recall, and vice versa. This is the precision/recall trade-off.
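A hedged sketch of why this trade-off appears: most classifiers score each instance and compare the score to a decision threshold, so raising the threshold makes positive predictions rarer (precision tends to rise, recall falls), and lowering it does the opposite. The scores below are made up purely for illustration.
import numpy as np
from sklearn.metrics import precision_score, recall_score
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55])  # made-up decision scores
y_true = np.array([0,   0,   1,    1,   1,    0,   1,   0])
for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_true, y_pred):.2f}, "
          f"recall={recall_score(y_true, y_pred):.2f}")
# threshold=0.3: precision=0.67, recall=1.00
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.7: precision=1.00, recall=0.50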
import numpy as np
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
# Creating independent and dependent variables.
X, y = mnist['data'], mnist['target']
# fetch_openml returns the MNIST labels as strings, so cast them to integers
# before comparing them with numbers.
y = y.astype(np.uint8)
# Splitting the data into training set and test set.
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
"""
The training set is already shuffled for us, which is good as this guarantees that all
cross-validation folds will be similar.
"""
# Training a binary classifier.
y_train_5 = (y_train == 5) # True for all 5s, False for all other digits.
y_test_5 = (y_test == 5)
"""
Building a dumb classifier that just classifies every single image in the “not-5” class.
"""
from sklearn.model_selection import cross_val_score
from sklearn.base import BaseEstimator
class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass

    def predict(self, X):
        # Predict "not a 5" for every single instance.
        return np.zeros((len(X), 1), dtype=bool)
never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
# Each fold scores around 90% accuracy: roughly 90% of the training images are not 5s,
# so always predicting "not-5" is right about 90% of the time. This is exactly the
# accuracy trap described above.
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(never_5_clf, X_train, y_train_5, cv=3)
"""
You could make predictions on the test set, but use the test set only at the very end of your project, once you have a classifier that you are ready to launch.
"""
# Constructing the confusion matrix.
from sklearn.metrics import confusion_matrix
confusion_matrix(y_train_5, y_train_pred)
# Each row is an actual class ("not-5", then 5) and each column a predicted class.
# Because this classifier never predicts a positive, the second column (FP and TP)
# is all zeros, and every actual 5 ends up counted as a false negative.
# Finding precision and recall
from sklearn.metrics import precision_score, recall_score
precision_score(y_train_5, y_train_pred)
recall_score(y_train_5, y_train_pred)
# Both return 0.0: the classifier never predicts a 5, so there are no true positives.
# Precision is 0/0 here, so scikit-learn raises an UndefinedMetricWarning and falls
# back to 0.0 (the zero_division parameter controls this behavior).
0.0
# To compute the F1 score, simply call the f1_score() function:
from sklearn.metrics import f1_score
f1_score(y_train_5, y_train_pred)
# The F1 score is 0.0 as well, since both precision and recall are 0.
0.0
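For contrast, here is a sketch (not part of the original notebook) of the same metrics for a real classifier, using scikit-learn's SGDClassifier as an example; unlike the never-5 baseline it makes actual positive predictions, so precision, recall, and F1 become meaningful.
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score
# A linear classifier trained with stochastic gradient descent.
sgd_clf = SGDClassifier(random_state=42)
y_train_pred_sgd = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
print(confusion_matrix(y_train_5, y_train_pred_sgd))
print("precision:", precision_score(y_train_5, y_train_pred_sgd))
print("recall:   ", recall_score(y_train_5, y_train_pred_sgd))
print("F1 score: ", f1_score(y_train_5, y_train_pred_sgd))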